Using EASI Sentinel-1 data¶

This notebook will demonstrate how to load and use Sentinel-1 data generated in EASI.

Adapted from https://github.com/csiro-easi/eocsi-hackathon-2022/blob/main/tutorials/SAR_data.ipynb

Introduction to Sentinel-1¶

Sentinel-1 carries a synthetic aperture radar (SAR) sensor, which can observe our planet even through cloud cover. Sentinel-1's SAR sensor is an active sensor; it emits light in the microwave range of the electromagnetic spectrum and measures the amount that returns, known as backscatter. Smooth surfaces, like calm water, have very low backscatter - most of the light bounces away from the sensor (known as specular reflection). For rough surfaces, like dirt or vegetation, light is scattered in all directions, with some of it returning to the sensor, producing moderate backscatter signals (known as diffuse backscatter). Large, human-made structures, which feature both vertical and horizontal smooth surfaces, produce large backscatter signals (known as double bounce). As such, the intensity of Sentinel-1 backscatter can distinguish water from land, as shown in the image below:

Radarsat-2 image

In the image, the river appears dark (specular reflection), with the urban area appearing very bright (double bounce). For more information, see the article from GIS Geography on SAR.
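Because backscatter intensity spans several orders of magnitude between these regimes, it is often compared on a decibel (dB) scale. A minimal sketch of the conversion, using hypothetical linear-power values chosen only to represent the three scattering regimes:

```python
import numpy as np

# Hypothetical linear-power backscatter values for the three regimes:
# specular (calm water), diffuse (vegetation), double bounce (urban)
linear = np.array([0.001, 0.05, 1.5])

# Convert linear power to decibels: dB = 10 * log10(power)
db = 10 * np.log10(linear)

print(db)  # water is strongly negative; double bounce approaches or exceeds 0 dB
```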

Set up¶

Import required packages and functions¶

In [1]:
# Basic plots
%matplotlib inline
# import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]

# Common imports and settings
import os, sys
from pathlib import Path
from IPython.display import Markdown
import pandas as pd
pd.set_option("display.max_rows", None)
import xarray as xr

# Datacube
import datacube
from datacube.utils.rio import configure_s3_access
from datacube.utils import masking
from datacube.utils.cog import write_cog
# https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Tools
from dea_tools.plotting import display_map
from dea_tools.datahandling import mostcommon_crs

# EASI tools
repo = f'{os.environ["HOME"]}/easi-notebooks'  # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiNotebooks, xarray_object_size, init_dask_cluster

# Data tools
import hvplot.xarray
import cartopy.crs as ccrs
import numpy as np
from dask.diagnostics import ProgressBar
from scipy.ndimage import uniform_filter, variance
from skimage.filters import threshold_minimum

# Dask
from dask.distributed import Client
from dask_gateway import Gateway
/env/lib/python3.10/site-packages/dea_tools/plotting.py:38: UserWarning: Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas

In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
  import geopandas as gpd

EASI environment¶

In [2]:
this = EasiNotebooks('csiro')

family = 'sentinel-1'
product = this.product(family)
display(Markdown(f'Default {family} product for "{this.name}": [{product}]({this.explorer}/products/{product})'))

Default sentinel-1 product for "csiro": asf_s1_grd_gamma0

Dask and ODC¶

In [3]:
# Start dask cluster - this may take a few minutes
cluster, client = init_dask_cluster()
display(cluster)

# ODC
dc = datacube.Datacube()

# List measurements
dc.list_measurements().loc[[product]]
Creating new cluster. Please wait for this to finish.
Out[3]:
                                name    dtype      units  nodata                                       aliases  flags_definition  spectral_definition  add_offset  scale_factor
product           measurement
asf_s1_grd_gamma0 vh              vh  float32  intensity       0                                   [gamma0_vh]               NaN                  NaN         NaN           NaN
                  vv              vv  float32  intensity       0                                   [gamma0_vv]               NaN                  NaN         NaN           NaN
                  angle        angle  float32    degrees       0  [localincidenceangle, local_incidence_angle]               NaN                  NaN         NaN           NaN

Choose an area of interest¶

In [25]:
# Set your own latitude / longitude

# Perth, WA
# latitude = (-34, -31)
# longitude = (114, 118)
# time = ('2020-01-12', '2020-01-13')  # S1B_IW_GRDH_1SDV_20200112T213428_20200112T213453_019789_0256A8_7757.zip
# time = ('2021-07-17', '2021-07-18')  # S1B_IW_GRDH_1SDV_20210717T213438_20210717T213503_027839_03526B_6421.zip

# Cairns, Qld
latitude = (-18.5, -14.5)
longitude = (144, 146.5)
# time = ('2020-01-18', '2020-01-19')  # S1A_IW_GRDH_1SDV_20200118T195200_20200118T195225_030859_038A8E_E46C.zip
time = ('2021-07-11', '2021-07-12')  # S1A_IW_GRDH_1SDV_20210711T195210_20210711T195235_038734_04921C_BC5C.zip

# Surat Basin, Qld
# latitude = (-29, -25)
# longitude = (149, 153)
# time = ('2022-01-16', '2022-01-17')  # S1A_IW_GRDH_1SSH_20220116T083316_20220116T083341_041483_04EED2_1DBE.zip

# Melbourne Vic
# latitude = (-39, -37)
# longitude = (143, 147)
# time = ('2020-01-15', '2020-01-16')  # S1A_IW_GRDH_1SDV_20200115T193320_20200115T193345_030815_038904_6D44.zip

display_map(longitude, latitude)
/env/lib/python3.10/site-packages/dea_tools/plotting.py:313: FutureWarning: This function is deprecated. See: https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1
  all_longitude, all_latitude = transform(Proj(crs), Proj('EPSG:4326'), all_x,
Out[25]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Load data¶

In [26]:
data = dc.load(
    product = product, 
    latitude = latitude,
    longitude = longitude,
    time = time,
    dask_chunks = {'latitude':2048, 'longitude':2048},      # Dask chunk size. Requires a dask cluster (see the "tutorials/dask" notebooks)
    group_by = 'solar_day',                    # Group by day method
)

display(data)

display(f'Number of bytes: {data.nbytes}')
display(xarray_object_size(data))
<xarray.Dataset>
Dimensions:      (time: 1, latitude: 20000, longitude: 12500)
Coordinates:
  * time         (time) datetime64[ns] 2021-07-11T19:51:57.500000
  * latitude     (latitude) float64 -14.5 -14.5 -14.5 ... -18.5 -18.5 -18.5
  * longitude    (longitude) float64 144.0 144.0 144.0 ... 146.5 146.5 146.5
    spatial_ref  int32 4326
Data variables:
    vh           (time, latitude, longitude) float32 dask.array<chunksize=(1, 2048, 2048), meta=np.ndarray>
    vv           (time, latitude, longitude) float32 dask.array<chunksize=(1, 2048, 2048), meta=np.ndarray>
    angle        (time, latitude, longitude) float32 dask.array<chunksize=(1, 2048, 2048), meta=np.ndarray>
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref
'Number of bytes: 3000260012'
'Dataset size: 2.79 GB'

Plot the data¶

In [27]:
def plot_band(band, clim=None, frame_height=500):
    if not clim:
        clim = (-0.01, 0.5)
    x = data[band].hvplot(
        groupby='time', rasterize=True,
        cmap="Greys_r", clim=clim,
        geo=True, crs=ccrs.PlateCarree(), projection=ccrs.PlateCarree(),
        title = f'Measurement: {band.upper()}', frame_height=frame_height,
    )
    return x

vv_plot = plot_band('vv')
vh_plot = plot_band('vh', clim=(-0.01,0.1))

# Display next to each other with axes linked (default)
vv_plot + vh_plot
Out[27]:
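When the linear `clim` values above do not give enough contrast, a common alternative (not used in this notebook) is to display backscatter in decibels, masking nodata zeros first so `log10(0)` does not produce `-inf`. A minimal numpy sketch with a small stand-in array; with the dataset above the equivalent masking would be `data.vv.where(data.vv > 0)`:

```python
import numpy as np

# Stand-in 2x2 VV band; 0.0 represents the product's nodata value
vv = np.array([[0.0, 0.05], [0.2, 1.0]], dtype=np.float32)

# Mask nodata as NaN before the log so log10(0) does not produce -inf
vv_masked = np.where(vv > 0, vv, np.nan)
vv_db = 10 * np.log10(vv_masked)
# With xarray the same masking is data.vv.where(data.vv > 0)
```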

Make an RGB image¶

For an RGB visualization we use VV, VH and the VH/VV ratio as the red, green and blue channels.

In [28]:
# Add the vh/vv band
data['vh_vv'] = data.vh / data.vv

# Scale the measurements by their median so they have a similar range for visualization
med = data / data.median(dim=['latitude','longitude'])

# Create an RGB array, and persist it on the dask cluster
rgb_ds = xr.concat([med.vv, med.vh, med.vh_vv], 'channel').rename('rgb').to_dataset().persist()
In [29]:
# Plot the RGB
rgb_plot = rgb_ds.hvplot.rgb(
    bands='channel',
    groupby='time', rasterize=True,
    geo=True, crs=ccrs.PlateCarree(), projection=ccrs.PlateCarree(),
    title='RGB', frame_height=500,
)

rgb_plot  # + vv_plot + vh_plot
Out[29]:

Export to GeoTIFFs¶

Recall that writing a dask-backed dataset to file requires calling .compute() on it first. If the area of interest is large enough, this can cause a large memory spike on your JupyterLab node, which in turn may kill the kernel. If so, skip this step, choose a smaller area of interest, or find a different way to export the data.
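One way to gauge the risk before computing is to check the in-memory footprint a band would need; `.nbytes` is available on numpy arrays and on lazy xarray/dask arrays alike. A minimal sketch, using a small numpy stand-in for `data.vv` and an arbitrary example threshold of 4 GiB:

```python
import numpy as np

# Small stand-in for one band (the real data.vv above is ~0.93 GiB)
band = np.zeros((1, 2000, 1250), dtype=np.float32)

# nbytes reports the full in-memory size, even before a dask array is computed
size_gib = band.nbytes / 2**30
if size_gib > 4:  # arbitrary example threshold
    print(f'Band needs {size_gib:.2f} GiB; consider a smaller area before .compute()')
```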

In [ ]:
# Make a directory to save outputs to
target = Path.home() / 'output'
if not target.exists(): target.mkdir()

def write_band(ds, varname):
    """Write the variable name of the xarray dataset to a Geotiff files for each time layer"""
    for i in range(len(ds.time)):
        date = ds[varname].isel(time=i).time.dt.strftime('%Y%m%d').data
        single = ds[varname].isel(time=i).compute()
        write_cog(geo_im=single, fname=f'{target}/example_sentinel-1_{varname}_{date}.tif', overwrite=True)
        
write_band(data, 'vv')
write_band(data, 'vh')
# write_band(rgb_da, 'rgb')
In [ ]: